Tutorial

This tutorial shows the functionalities of Extremes.jl. They are illustrated by reproducing some of the results shown by Coles (2001) in An Introduction to Statistical Modeling of Extreme Values.

Before executing this tutorial, make sure to have installed the following packages:

  • Extremes.jl (of course)
  • DataFrames.jl (for using the DataFrame type)
  • Distributions.jl (for using probability distribution objects)
  • Gadfly.jl (for plotting)

and import them using the following command:

julia> using Extremes, Dates, DataFrames, Distributions, Gadfly

Model for stationary block maxima

The stationary BlockMaxima model is illustrated using the annual maximum sea-levels recorded at Port Pirie in South Australia from 1923 to 1987, studied by Coles (2001) in Chapter 3.

The Extremes.jl package supports maximum likelihood inference, Bayesian inference and inference based on the probability weigthed moments. For the GEV parameter estimation, the following functions can be used:

  • gevfitpwm: estimation with probability weighted moments;
  • gevfit: estimation with maximum likelihood;
  • gevfitbayes: estimation with the Bayesian method.

These functions return a fittedEVA type that can be used by all the other functions presented in this tutorial. The parameters estimates are contained in the field θ̂ of this structure.

Log-scale paremeter

These functions return the estimate of the log-scale parameter $\phi = \log \sigma$.

In this example, the data are contained in a DataFrame. The fit functions can be called using the DataFrame as the first argument and the data column symbol as the second argument.

Load the data

Loading the annual maximum sea-levels at Port Pirie:

data = load("portpirie")
first(data,5)

5 rows × 2 columns

YearSeaLevel
Int64Float64
119234.03
219243.83
319253.65
419263.88
519274.01

The loaded data are contained in a Dataframe. The annual maxima can be shown as a function of the year using the Gadfly package:

set_default_plot_size(12cm, 8cm)
plot(data, x=:Year, y=:SeaLevel, Geom.line)
Year 1820 1840 1860 1880 1900 1920 1940 1960 1980 2000 2020 2040 2060 2080 2100 1840 1845 1850 1855 1860 1865 1870 1875 1880 1885 1890 1895 1900 1905 1910 1915 1920 1925 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 2065 2070 2075 2080 1800 1900 2000 2100 1840 1845 1850 1855 1860 1865 1870 1875 1880 1885 1890 1895 1900 1905 1910 1915 1920 1925 1930 1935 1940 1945 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 2060 2065 2070 2075 2080 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95 4.00 4.05 4.10 4.15 4.20 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80 4.85 4.90 4.95 5.00 5.05 5.10 5.15 5.20 5.25 5.30 5.35 5.40 5.45 5.50 5.55 5.60 5.65 5.70 5.75 5.80 5.85 5.90 5.95 6.00 6.05 6.10 6.15 6.20 6.25 6.30 6.35 6.40 6.45 6.50 0 2 4 6 8 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 SeaLevel

Maximum likelihood inference

GEV parameters estimation

The GEV parameter estimation with maximum likelihood is performed with the gevfit function:

julia> fm = gevfit(data, :SeaLevel)
MaximumLikelihoodEVA
model :
	BlockMaxima
	data :		Array{Float64,1}[65]
	location :	μ ~ 1
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[3.874750223091266, -1.6192723640210762, -0.05010719929448139]

The vector of the parameter estimates $\hat\mathbf{\theta} = (μ̂,\, ϕ̂,\, ξ̂)^\top$ is contained in the field θ̂ of the structure fm:<fittedEVA.

The approximate covariance matrix of the parameter estimates can be obtained with the parametervar function:

julia> parametervar(fm)
3×3 Array{Float64,2}:
  0.000780204   0.000995016  -0.0010741
  0.000995016   0.0104541    -0.00392576
 -0.0010741    -0.00392576    0.00965404

Confidence intervals on the parameter estimates can be obtained with the cint function:

julia> cint(fm)
3-element Array{Array{Float64,1},1}:
 [3.820004234825991, 3.929496211356541]
 [-1.819669858589598, -1.4188748694525544]
 [-0.24268345866324337, 0.14246906007428056]

Diagnostic plots

Several diagnostic plots for assessing the accuracy of the GEV model fitted to the Port Pirie data are can be shown with the diagnosticplots function:

set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)
Data 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 0 2 4 6 8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 -2 0 2 4 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 Density Density Plot Return Period 10-2.5 10-2.0 10-1.5 10-1.0 10-0.5 100.0 100.5 101.0 101.5 102.0 102.5 103.0 103.5 104.0 104.5 10-2.0 10-1.9 10-1.8 10-1.7 10-1.6 10-1.5 10-1.4 10-1.3 10-1.2 10-1.1 10-1.0 10-0.9 10-0.8 10-0.7 10-0.6 10-0.5 10-0.4 10-0.3 10-0.2 10-0.1 100.0 100.1 100.2 100.3 100.4 100.5 100.6 100.7 100.8 100.9 101.0 101.1 101.2 101.3 101.4 101.5 101.6 101.7 101.8 101.9 102.0 102.1 102.2 102.3 102.4 102.5 102.6 102.7 102.8 102.9 103.0 103.1 103.2 103.3 103.4 103.5 103.6 103.7 103.8 103.9 104.0 10-2 100 102 104 10-2.0 10-1.8 10-1.6 10-1.4 10-1.2 10-1.0 10-0.8 10-0.6 10-0.4 10-0.2 100.0 100.2 100.4 100.6 100.8 101.0 101.2 101.4 101.6 101.8 102.0 102.2 102.4 102.6 102.8 103.0 103.2 103.4 103.6 103.8 104.0 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95 4.00 4.05 4.10 4.15 4.20 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80 4.85 4.90 4.95 5.00 5.05 5.10 5.15 5.20 5.25 5.30 5.35 5.40 5.45 5.50 5.55 5.60 5.65 5.70 5.75 5.80 5.85 5.90 5.95 6.00 6.05 6.10 6.15 6.20 6.25 6.30 6.35 6.40 6.45 6.50 0 2 4 6 8 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 Return Level Return Level Plot Model 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95 4.00 4.05 4.10 4.15 4.20 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80 4.85 4.90 4.95 5.00 5.05 5.10 5.15 5.20 5.25 5.30 5.35 5.40 5.45 5.50 5.55 5.60 5.65 5.70 5.75 5.80 5.85 5.90 5.95 6.00 6.05 6.10 6.15 6.20 6.25 6.30 6.35 6.40 6.45 6.50 0 2 4 6 8 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95 4.00 4.05 4.10 4.15 4.20 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80 4.85 4.90 4.95 5.00 5.05 5.10 5.15 5.20 5.25 5.30 5.35 5.40 5.45 5.50 5.55 5.60 5.65 5.70 5.75 5.80 5.85 5.90 5.95 6.00 6.05 6.10 6.15 6.20 6.25 6.30 6.35 6.40 6.45 6.50 0 2 4 6 8 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 Empirical Quantile Plot Model -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 Empirical Probability Plot

The diagnostic plots consist in the probability plot (upper left panel), quantile plot (upper right panel), return level plot (lower left panel) and the density plot (lower right panel). These plots can be displayed separately using respectively the functions probplot, qqplot, returnlevelplot and histplot.

Return level estimation

T-year return level estimate can be obtained using the function returnlevel on a fittedEVA object. The first argument is the fitted model, the second is the return period in years and the last one is the confidence level for computing the confidence interval.

For example, the 100-year return level for the Port Pirie block maxima model and the corresponding 95% confidence interval can be estimated with this commands:

julia> r = returnlevel(fm, 100, .95)
ReturnLevel
fittedmodel :
		MaximumLikelihoodEVA
		model :
			BlockMaxima
			data :		Array{Float64,1}[65]
			location :	μ ~ 1
			logscale :	ϕ ~ 1
			shape :		ξ ~ 1

		θ̂  :	[3.874750223091266, -1.6192723640210762, -0.05010719929448139]
returnperiod :	100
value :		Array{Float64,1}[1]
cint :		Array{Array{Float64,1},1}[1]

where the return value can be accessed with

julia> r.value
1-element Array{Float64,1}:
 4.688403360432851

and where the corresponding confidence interval can be accessed with

julia> r.cint
1-element Array{Array{Float64,1},1}:
 [4.377121171613511, 4.999685549252191]
Type-stable function

In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.

To get the scalar return level in the stationary case, the following command can be used:

julia> r.value[]
4.688403360432851

To get the scalar confidence interval in the stationary case, the following command can be used:

julia> r.cint[]
2-element Array{Float64,1}:
 4.377121171613511
 4.999685549252191

Bayesian Inference

GEV parameters estimation

The GEV parameter estimation with the Bayesian method is performed with the gevfitbayes function:

julia> fm = gevfitbayes(data, :SeaLevel)

Progress:   0%|                                         |  ETA: 0:20:24
Progress:   9%|███▋                                     |  ETA: 0:00:06
Progress:  14%|█████▊                                   |  ETA: 0:00:04
Progress:  19%|███████▊                                 |  ETA: 0:00:03
Progress:  24%|█████████▉                               |  ETA: 0:00:03
Progress:  29%|████████████                             |  ETA: 0:00:02
Progress:  33%|█████████████▍                           |  ETA: 0:00:02
Progress:  38%|███████████████▋                         |  ETA: 0:00:02
Progress:  40%|████████████████▍                        |  ETA: 0:00:02
Progress:  45%|██████████████████▋                      |  ETA: 0:00:02
Progress:  50%|████████████████████▋                    |  ETA: 0:00:01
Progress:  55%|██████████████████████▋                  |  ETA: 0:00:01
Progress:  59%|████████████████████████▏                |  ETA: 0:00:01
Progress:  64%|██████████████████████████▍              |  ETA: 0:00:01
Progress:  69%|████████████████████████████▌            |  ETA: 0:00:01
Progress:  75%|██████████████████████████████▋          |  ETA: 0:00:01
Progress:  80%|████████████████████████████████▋        |  ETA: 0:00:01
Progress:  83%|██████████████████████████████████▎      |  ETA: 0:00:00
Progress:  88%|████████████████████████████████████▎    |  ETA: 0:00:00
Progress:  94%|██████████████████████████████████████▍  |  ETA: 0:00:00
Progress:  99%|████████████████████████████████████████▍|  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:02
BayesianEVA
model :
	BlockMaxima
	data :		Array{Float64,1}[65]
	location :	μ ~ 1
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

sim :
	Mamba.Chains
	Iterations :		2001:5000
	Thinning interval :	1
	Chains :		1
	Samples per chain :	3000
	Value :			Array{Float64,3}[3000,3,1]
Prior

Currently, only the improper uniform prior is implemented, i.e. \[ f_{(μ,ϕ,ξ)}(μ,ϕ,ξ) ∝ 1. \] It yields to a proper posterior as long as the sample size is larger than 3 (Northrop and Attalides, 2016).

Sampling scheme

Currently, the No-U-Turn Sampler extension (Hoffman and Gelman, 2014) to Hamiltonian Monte Carlo (Neel, 2011, Chapter 5) is implemented for simulating an autocorrelated sample from the posterior distribution.

The approximate covariance matrix of the parameter estimates can be obtained with the parametervar function:

julia> parametervar(fm)
3×3 Array{Float64,2}:
  0.000852665   0.00106937  -0.00117325
  0.00106937    0.0111913   -0.00406731
 -0.00117325   -0.00406731   0.010271

Confidence intervals on the parameter estimates can be obtained with the cint function:

julia> cint(fm)
3-element Array{Array{Float64,1},1}:
 [3.815728600356368, 3.9285930389938737]
 [-1.7885411295725389, -1.3855365071668206]
 [-0.1942955022409137, 0.1983585965446626]

Diagnostic plots

Several diagnostic plots for assessing the accuracy of the GEV model fitted to the Port Pirie data are can be shown with the diagnosticplotsfunction:

set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)

The diagnostic plots consist in the probability plot (upper left panel), quantile plot (upper right panel), return level plot (lower left panel) and the density plot (lower right panel). These plots can be displayed separately using respectively the functions probplot, qqplot, returnlevelplot and histplot.

Return level estimation

T-year return level estimate can be obtained using the function returnlevel on a fittedEVA object. The first argument is the fitted model, the second is the return period in years and the last one is the confidence level for computing the confidence interval.

For example, the 100-year return level for the Port Pirie block maxima model and the corresponding 95% confidence interval can be estimated with this commands:

julia> r = returnlevel(fm, 100, .95)
ReturnLevel
fittedmodel :
		BayesianEVA
		model :
			BlockMaxima
			data :		Array{Float64,1}[65]
			location :	μ ~ 1
			logscale :	ϕ ~ 1
			shape :		ξ ~ 1

		sim :
			Mamba.Chains
			Iterations :		2001:5000
			Thinning interval :	1
			Chains :		1
			Samples per chain :	3000
			Value :			Array{Float64,3}[3000,3,1]
returnperiod :	100
value :		Array{Float64,1}[1]
cint :		Array{Array{Float64,1},1}[1]

where the return value can be accessed with

julia> r.value
1-element Array{Float64,1}:
 4.785574470258748

and where the corresponding confidence interval can be accessed with

julia> r.cint
1-element Array{Array{Float64,1},1}:
 [4.517764996731947, 5.286960479468266]
Type-stable function

In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.

To get the scalar return level in the stationary case, the following command can be used:

julia> r.value[]
4.785574470258748

To get the scalar confidence interval in the stationary case, the following command can be used:

julia> r.cint[]
2-element Array{Float64,1}:
 4.517764996731947
 5.286960479468266

Inference based on the probability weighted moments

GEV parameters estimation

The parameter estimation with the probability weighted moments method is performed with the gevfitpwm function:

julia> fm = gevfitpwm(data, :SeaLevel)
pwmEVA
model :
	BlockMaxima
	data :		Array{Float64,1}[65]
	location :	μ ~ 1
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[3.8731723562720766, -1.5932320395836068, -0.051477125862911276]

The approximate covariance matrix of the parameter estimates using a bootstrap procedure can be obtained with the parametervar function:

julia> parametervar(fm)
3×3 Array{Float64,2}:
  0.000851892   0.00111211  -0.00104506
  0.00111211    0.01017     -0.00369442
 -0.00104506   -0.00369442   0.0071561

Confidence intervals on the parameter estimates using a bootstrap procedure can be obtained with the cint function:

julia> cint(fm)
3-element Array{Array{Float64,1},1}:
 [3.82196149828531, 3.9339824796591647]
 [-1.818123611627562, -1.413472826223069]
 [-0.22922706015687963, 0.08795530153021101]

Diagnostic plots

Several diagnostic plots for assessing the accuracy of the GEV model fitted to the Port Pirie data are can be shown with the diagnosticplotsfunction:

set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)

The diagnostic plots consist in the probability plot (upper left panel), quantile plot (upper right panel), return level plot (lower left panel) and the density plot (lower right panel). These plots can be displayed separately using respectively the functions probplot, qqplot, returnlevelplot and histplot.

Return level estimation

T-year return level estimate can be obtained using the function returnlevel on a fittedEVA object. The first argument is the fitted model, the second is the return period in years and the last one is the confidence level for computing the confidence interval.

For example, the 100-year return level for the Port Pirie block maxima model and the corresponding 95% confidence interval can be estimated with this commands:

julia> r = returnlevel(fm, 100, .95)
ReturnLevel
fittedmodel :
		pwmEVA
		model :
			BlockMaxima
			data :		Array{Float64,1}[65]
			location :	μ ~ 1
			logscale :	ϕ ~ 1
			shape :		ξ ~ 1

		θ̂  :	[3.8731723562720766, -1.5932320395836068, -0.051477125862911276]
returnperiod :	100
value :		Array{Float64,1}[1]
cint :		Array{Array{Float64,1},1}[1]

where the return value can be accessed with

julia> r.value
1-element Array{Float64,1}:
 4.705766363418328

and where the corresponding confidence interval can be accessed with

julia> r.cint
1-element Array{Array{Float64,1},1}:
 [4.453676163116104, 4.949109267044802]
Type-stable function

In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.

To get the scalar return level in the stationary case, the following command can be used:

julia> r.value[]
4.705766363418328

To get the scalar confidence interval in the stationary case, the following command can be used:

julia> r.cint[]
2-element Array{Float64,1}:
 4.453676163116104
 4.949109267044802

Model for stationary threshold exceedances

The stationary ThresholdExceedance model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4.

Load the data

Loading the daily rainfall at a location in South-England:

data = load("rain")
first(data,5)

5 rows × 2 columns

DateRainfall
Date…Float64
11914-01-010.0
21914-01-022.3
31914-01-031.3
41914-01-046.9
51914-01-054.6

Plotting the data using the Gadfly package:

set_default_plot_size(14cm ,8cm)
plot(data, x=:Date, y=:Rainfall, Geom.point, Theme(discrete_highlight_color=c->nothing))
Date Jan 1, 1840 1850 1860 1870 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030 2040 Jan 1, 1800 1850 1900 1950 2000 2050 Jan 1, 1800 1850 1900 1950 2000 2050 Jan 1, 1800 1850 1900 1950 2000 2050 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -125 -100 -75 -50 -25 0 25 50 75 100 125 150 175 200 225 -100 -95 -90 -85 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 200 -100 0 100 200 -100 -90 -80 -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 Rainfall

Threshold selection

TODO

GPD parameters estimation

Let's first identify the threshold exceedances:

threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
first(df, 5)

5 rows × 2 columns

DateRainfall
Date…Float64
11914-02-0731.8
21914-03-0832.5
31914-12-1731.8
41914-12-3044.5
51915-02-1330.5

Get the exceedances above the threshold:

df[!,:Rainfall] =  df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)

5 rows × 2 columns

DateExceedance
Date…Float64
11914-02-071.8
21914-03-082.5
31914-12-171.8
41914-12-3014.5
51915-02-130.5

GP parameters estimation with probability weighted moments

The GP parameter estimation with probability weighted moments is performed as follows:

julia> fm = gpfitpwm(df, :Exceedance)
pwmEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[1.9877399514951732, 0.19651587232938317]

The approximate covariance matrix of the parameter estimates can be obtained with the function parametervar:

julia> parametervar(fm)
2×2 Array{Float64,2}:
  0.0160088   -0.00638048
 -0.00638048   0.00656376

GP parameters estimation with maximum likelihood

The GP parameter estimation with maximum likelihood is performed as follows:

julia> fm = gpfit(df, :Exceedance)
MaximumLikelihoodEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[2.006896498380506, 0.1844926991237574]

The approximate covariance matrix of the parameter estimates can be obtained with the function parametervar:

julia> parametervar(fm)
2×2 Array{Float64,2}:
  0.0165972   -0.00880429
 -0.00880429   0.0102416

GP parameters estimation with the Bayesian method

The GP parameter estimation with the Bayesian method is performed as follows:

julia> fm = gpfitbayes(df, :Exceedance)

Progress:   4%|█▌                                       |  ETA: 0:00:03
Progress:  12%|████▉                                    |  ETA: 0:00:01
Progress:  20%|████████▎                                |  ETA: 0:00:01
Progress:  28%|███████████▋                             |  ETA: 0:00:01
Progress:  36%|██████████████▊                          |  ETA: 0:00:01
Progress:  44%|██████████████████                       |  ETA: 0:00:01
Progress:  49%|████████████████████▎                    |  ETA: 0:00:01
Progress:  58%|███████████████████████▉                 |  ETA: 0:00:01
Progress:  66%|███████████████████████████▎             |  ETA: 0:00:00
Progress:  74%|██████████████████████████████▌          |  ETA: 0:00:00
Progress:  83%|█████████████████████████████████▉       |  ETA: 0:00:00
Progress:  90%|████████████████████████████████████▉    |  ETA: 0:00:00
Progress:  98%|████████████████████████████████████████▎|  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:01
BayesianEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

sim :
	Mamba.Chains
	Iterations :		2001:5000
	Thinning interval :	1
	Chains :		1
	Samples per chain :	3000
	Value :			Array{Float64,3}[3000,2,1]
Prior

Currently, only the improper uniform prior is implemented, i.e. \[ f_{(ϕ,ξ)}(ϕ,ξ) ∝ 1. \] It yields to a proper posterior as long as the sample size is larger than 2 (Northrop and Attalides, 2016).

Sampling scheme

Currently, the No-U-Turn Sampler extension (Hoffman and Gelman, 2014) to Hamiltonian Monte Carlo (Neel, 2011, Chapter 5) is implemented for simulating an autocorrelated sample from the posterior distribution.

The approximate covariance matrix of the parameter estimates can be obtained with the function parametervar:

julia> parametervar(fm)
2×2 Array{Float64,2}:
  0.0173363   -0.00928924
 -0.00928924   0.0112485

Return level estimation

With the ThresholdExceedance structure, the returnlevel function requires several arguments to calculate the T-year return level:

  • the threshold value;
  • the number of total observation (below and above the threshold);
  • the number of observations per year;
  • the return period T;
  • the confidence level for computing the confidence interval.

The function uses the Peaks-Over-Threshold model definition (Coles, 2001, Chapter 4) for computing the T-year return level.

For the rainfall example, the 100-year return level can be estimated as follows:

fm = gpfit(df, :Exceedance)
r = returnlevel(fm, threshold, size(data,1), 365, 100, .95)
ReturnLevel
fittedmodel :
		MaximumLikelihoodEVA
		model :
			ThresholdExceedance
			data :		Array{Float64,1}[152]
			logscale :	ϕ ~ 1
			shape :		ξ ~ 1

		θ̂  :	[2.006896498380506, 0.1844926991237574]
returnperiod :	100
value :		Array{Float64,1}[1]
cint :		Array{Array{Float64,1},1}[1]

where the value can be accessed with

julia> r.value
1-element Array{Float64,1}:
 106.32558691303024

and where the corresponding confidence interval can be accessed with

julia> r.cint
1-element Array{Array{Float64,1},1}:
 [65.48163774428642, 147.16953608177405]
Note

In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.

To get the scalar return level in the stationary case, the following command can be used:

julia> r.value[]
106.32558691303024

To get the scalar confidence interval in the stationary case, the following command can be used:

julia> r.cint[]
2-element Array{Float64,1}:
  65.48163774428642
 147.16953608177405

Probability weighted moments estimation

Probability weighted moments estimation of the GEV parameters can also be performed by using the gevfitpwm function. All the methods also apply to the pwmEVA object.

julia> fm = gpfitpwm(df, :Exceedance)
pwmEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[1.9877399514951732, 0.19651587232938317]

Bayesian estimation

Bayesian estimation of the GEV parameters can also be performed by using the gevfitbayes function. All the methods also apply to the `BayesianEVA object.

julia> fm = gpfitbayes(df, :Exceedance)

Progress:   8%|███▏                                     |  ETA: 0:00:01
Progress:  15%|██████▎                                  |  ETA: 0:00:01
Progress:  24%|█████████▊                               |  ETA: 0:00:01
Progress:  32%|█████████████▎                           |  ETA: 0:00:01
Progress:  41%|████████████████▊                        |  ETA: 0:00:01
Progress:  49%|████████████████████▏                    |  ETA: 0:00:01
Progress:  57%|███████████████████████▌                 |  ETA: 0:00:01
Progress:  63%|█████████████████████████▉               |  ETA: 0:00:00
Progress:  71%|█████████████████████████████▏           |  ETA: 0:00:00
Progress:  80%|████████████████████████████████▋        |  ETA: 0:00:00
Progress:  88%|████████████████████████████████████     |  ETA: 0:00:00
Progress:  96%|███████████████████████████████████████▍ |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:01
BayesianEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

sim :
	Mamba.Chains
	Iterations :		2001:5000
	Thinning interval :	1
	Chains :		1
	Samples per chain :	3000
	Value :			Array{Float64,3}[3000,2,1]

Model for dependent data

The stationary ThresholdExceedance model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4.

Load the data

Loading the daily rainfall at a location in South-England:

data = load("wooster")
first(data,5)

5 rows × 2 columns

DateTemperature
Date…Int64
11983-01-0123
21983-01-0229
31983-01-0319
41983-01-0414
51983-01-0527

Plotting the data using the Gadfly package:

plot(data, x=:Date, y=:Temperature, Geom.point, Theme(discrete_highlight_color=c->nothing))
Date Jan 1, 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 Jan 1, 1975 1980 1985 1990 1995 Jan 1, 1975 1980 1985 1990 1995 Jan 1, 1975 1980 1985 1990 1995 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -150 -125 -100 -75 -50 -25 0 25 50 75 100 125 150 175 200 -125 -120 -115 -110 -105 -100 -95 -90 -85 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 -200 -100 0 100 200 -130 -120 -110 -100 -90 -80 -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 Temperature
df = copy(data)
df[!,:Temperature] = -data[:,:Temperature]
filter!(row -> month(row.Date) ∈ (1,2,11,12), df)
plot(df, x=:Date, y=:Temperature, Geom.point)
Date Jan 1, 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 Jan 1, 1975 1980 1985 1990 1995 Jan 1, 1975 1980 1985 1990 1995 Jan 1, 1975 1980 1985 1990 1995 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -160 -140 -120 -100 -80 -60 -40 -20 0 20 40 60 80 100 120 -140 -135 -130 -125 -120 -115 -110 -105 -100 -95 -90 -85 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 -200 -100 0 100 -140 -135 -130 -125 -120 -115 -110 -105 -100 -95 -90 -85 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 Temperature

Declustering the threshold exceedances

threshold = -10
cluster = getcluster(df[:,:Temperature], -10, runlength=4)
julia> typeof(cluster)
Array{Cluster,1}

GPD parameters estimation

Let's first identify the threshold exceedances:

threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
first(df, 5)

5 rows × 2 columns

DateRainfall
Date…Float64
11914-02-0731.8
21914-03-0832.5
31914-12-1731.8
41914-12-3044.5
51915-02-1330.5

Get the exceedances above the threshold:

df[!,:Rainfall] =  df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)

5 rows × 2 columns

DateExceedance
Date…Float64
11914-02-071.8
21914-03-082.5
31914-12-171.8
41914-12-3014.5
51915-02-130.5

Generalized Pareto parameter estimation by maximum likelihood:

julia> fm = gpfit(df, :Exceedance)
MaximumLikelihoodEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[2.006896498380506, 0.1844926991237574]
Note

The function returns the estimates of the log-scale parameter $\phi = \log \sigma$.

Return level estimation

With the ThresholdExceedance structure, the returnlevel function requires several arguments to calculate the T-year return level:

  • the threshold value;
  • the number of total observation (below and above the threshold);
  • the number of observations per year;
  • the return period T;
  • the confidence level for computing the confidence interval.

The function uses the Peaks-Over-Threshold model definition (Coles, 2001, Chapter 4) for computing the T-year return level.

For the rainfall example, the 100-year return level can be estimated as follows:

julia> r = returnlevel(fm, threshold, size(data,1), 365, 100, .95)
ReturnLevel
fittedmodel :
		MaximumLikelihoodEVA
		model :
			ThresholdExceedance
			data :		Array{Float64,1}[152]
			logscale :	ϕ ~ 1
			shape :		ξ ~ 1

		θ̂  :	[2.006896498380506, 0.1844926991237574]
returnperiod :	100
value :		Array{Float64,1}[1]
cint :		Array{Array{Float64,1},1}[1]

where the value can be accessed with

julia> r.value
1-element Array{Float64,1}:
 106.32558691303024

and where the corresponding confidence interval can be accessed with

julia> r.cint
1-element Array{Array{Float64,1},1}:
 [65.48163774428642, 147.16953608177405]
Note

In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.

To get the scalar return level in the stationary case, the following command can be used:

julia> r.value[]
106.32558691303024

To get the scalar confidence interval in the stationary case, the following command can be used:

julia> r.cint[]
2-element Array{Float64,1}:
  65.48163774428642
 147.16953608177405

Probability weighted moments estimation

Probability weighted moments estimation of the GEV parameters can also be performed by using the gevfitpwm function. All the methods also apply to the pwmEVA object.

julia> fm = gpfitpwm(df, :Exceedance)
pwmEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[1.9877399514951732, 0.19651587232938317]

Bayesian estimation

Bayesian estimation of the GEV parameters can also be performed by using the gevfitbayes function. All the methods also apply to the `BayesianEVA object.

julia> fm = gpfitbayes(df, :Exceedance)

Progress:   8%|███▍                                     |  ETA: 0:00:01
Progress:  16%|██████▋                                  |  ETA: 0:00:01
Progress:  25%|██████████▏                              |  ETA: 0:00:01
Progress:  31%|████████████▊                            |  ETA: 0:00:01
Progress:  40%|████████████████▍                        |  ETA: 0:00:01
Progress:  48%|███████████████████▋                     |  ETA: 0:00:01
Progress:  57%|███████████████████████▎                 |  ETA: 0:00:01
Progress:  65%|██████████████████████████▋              |  ETA: 0:00:00
Progress:  73%|██████████████████████████████           |  ETA: 0:00:00
Progress:  79%|████████████████████████████████▌        |  ETA: 0:00:00
Progress:  88%|████████████████████████████████████▎    |  ETA: 0:00:00
Progress:  97%|███████████████████████████████████████▊ |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:01
BayesianEVA
model :
	ThresholdExceedance
	data :		Array{Float64,1}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

sim :
	Mamba.Chains
	Iterations :		2001:5000
	Thinning interval :	1
	Chains :		1
	Samples per chain :	3000
	Value :			Array{Float64,3}[3000,2,1]

Model for non-stationary data

Coles(2001, Chapter 6)